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A  method  for  estimating  the  density  of  the  latent  trait  (or  ability)  in  an  item  response 
model  is  given  based  on  the  logspline  density  estimator  of  Stone  and  Koo.  In  the 
implementation  presented  here,  families  of  densities  whose  logarithms  are  quadratic 
splines  are  used  to  estimate  the  unknown  latent  trait  density.  The  number  of  knots 
in  the  spline  is  variable  permitting  arbitrary  densities  to  be  well  approximated  from 
the  logspline  family.  Because  the  family  is  exponential  and  contains  all  normal  dis¬ 
tributions,  the  likelihood  ratio  test  can  be  used  to  test  for  normality.  An  ad  hoc 
method  is  proposed  for  choosing  the  number  of  knots,  and  the  method  is  illustrated  with 
two  simulated  data  sets. 
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1.  INTRODUCTION 


The  purpose  of  this  report  is  to  describe  a  technique  for  estimating  the  latent  trait 
density  in  an  item  response  model.  In  this  study,  the  item  distributions  are  assumed 
known,  and  the  latent  traits  are  regarded  as  a  random  sample  from  an  unknown 
distribution  with  a  smooth  density. 

This  work  is  directly  related  to  the  Bayesian  models  of  Bock  and  Aitken  (1981)  and 
Rigdon  and  Tsutakawa  (1983).  Bock  and  Aitken  assumed  a  somewhat  crude  discrete  prior 
at  specified  ability  levels.  In  effect  they  used  a  histogram  for  the  estimator  of  the  latent 
trait  density.  In  the  parametric  empirical  Bayes  approach  of  Rigdon  and  Tsutakawa,  the 
prior  was  assumed  to  be  normal  and  parameter  estimates  were  obtained.  The  approach 
here  is  to  obtain  estimates  from  a  large  (nonstandard)  class  of  densities  in  which  the 
number  of  parameters  is  allowed  to  grow  with  the  sample  size.  This  is  analogous  to  the 
traditional  way  to  estimate  curves  of  unspecified  form  in  regression,  for  example.  The 
purpose  is  to  achieve  at  least  consistent  estimates  regardless  of  the  form  of  the  density. 
There  are  inherent  theoretical  difficulties  in  the  problem,  but  we  show  by  example  that 
good  estimates  are  possible. 

The  notation  here  follows  Rigdon  and  Tsutakawa.  Given  n  examinees  and  m  test 
items,  let  denote  the  vector  of  responses  of  the  ith  examinee.  Let  /?  denote  the  item 
parameters,  and  assume  y j  has  conditional  pdf 
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where  $.  is  a  real— valued  ability  (or  latent  trait)  parameter  of  the  ith  examinee.  The 
ability  pdiameters  9  =  (#-,,•••  ,9  )  are  assumed  to  be  drawn  at  random  from  an  unknown 

A.  II 

distribution  with  density  gg(0).  Thus  y  and  flhave  the  joint  pdf 

f(y,^  =  ;np(yi|^g0(^) 

A  -  A. 

=  p(y^/3)g0(^)- 


Since  0  is  unobservable,  inference  on  gg  must  be  based  on  the  marginal  likelihood 
1 1-1)  L(y|g)  =  /f(y,0|i))g(0)d0, 

The  latent  trait  density  g^,  a  prior  density  in  the  Bayesian  framework,  is  also  known  as  a 
mixing  density  in  this  situation. 

For  simplicity,  details  are  carried  out  here  for  the  Rasch  model  assuming  local 
independence.  In  principle  the  techniques  apply  as  well  to  other  parametric  item 
distributions.  We  assume  throughout  that  the  item  parameters  are  known.  Letting  y^  = 

( v i i ,  -  •  •  -yim)  denote  the  vector  of  observations  for  the  ith  subject,  the  Rasch  model 
specifies 

P(y,j  =  llOp/ij)  =  1  -  P(yu  =  OlOp/y  =  e"i  “^/(i  +  e°i  - 
Assuming  local  independence  and  using  the  sufficient  statistics  r  =  E^^yj.  (the  raw  score 
for  the  ith  subject)  and  qj  =  =1yjj  (the  total  number  of  correct  responses  on  the  jth 
item),  we  then  have 

n  m 

p(y|#,0  =  n  n  P(y..  = 

i=l  j=l  1J  1  J 


e£ri  9j  -  Vi 
n  n<i  +  e"i  ~8i) 


If  we  let 


p  (0)  = - ,  r  =  0,---,m, 

n(i  +  e^  “ 

j 

then  the  logarithm  of  (1.1)  is  given  by 

(1-2)  log(L(g))  = -Eqj/?j  +  S^nrlogjjpr(0)g(0)d0j, 

where  a  —  #{i  :  r,  =  r,  1  <  i  <  n}.  (The  nr  are  the  bin  frequencies  for  the  m  +  1  possible 
raw  scores.) 


There  is  a  fairly  large  literature  on  methods  for  estimating  a  mixing  distribution 


itself  without  assuming  the  existence  of  a  density  going  back  at  least  to  Deely  and  Kruse 
(1968).  It  is  known  that  the  unrestricted  marginal  maximum  likelihood  estimate  of  the 
mixing  distribution  is  discrete.  Laird  (1978)  used  the  EM  algorithm  to  find  the 
unrestricted  marginal  maximum  likelihood  estimate,  although  convergence  in  these 
problems  is  generally  quite  slow.  Padgett  and  Tsokos  (1979)  considered  mixing 
distribution  estimates  in  the  context  of  empirical  Bayes  estimation  similar  to  the  situation 
considered  here.  In  contrast,  although  extensive  work  has  been  done  on  density  estimation 
(see  for  example  Silverman  (1986)),  there  has  been  relatively  little  work  on  estimating  a 
mixing  density.  O'Bryan  and  Susarla  (1975)  proposed  a  Fourier— transform  method  in  the 
case  of  deconvolution  for  empirical  Bayes  problems,  but  their  method  appears  to  be 
computationally  intractable  in  the  general  setting  of  item  response  models.  Mendelsohn 
and  Rice  (1982)  obtained  good  results  by  deconvolving  a  smoothed  estimate  of  the 
marginal  density,  and  more  recently  Levine  and  Williams  (1987)  used  a  linear  estimate 
with  constraints  for  latent  trait  models.  Both  of  these  techniques  require  quadratic 
programming. 

A  different  approach  will  be  used  here.  In  Section  2  we  follow  Stone  and  Koo  (1986) 
and  introduce  an  exponential  family  of  densities  whose  logarithms  are  spline  functions 
parameterized  by  an  N— vector  <p  =  (^  •  •  •  ,^).  The  parameterization  will  be  variable  to 
allow  good  approximation  of  arbitrary  smooth  densities  with  enough  smoothing  to  stay 
away  from  the  discrete  marginal  mle  of  the  distribution,  and  estimates  will  be  obtained  by 
maximizing  the  marginal  likelihood  in  ip.  In  addition,  because  the  families  of  densities 
contain  all  normal  distributions,  the  likelihood  ratio  test  will  be  available  to  test  for 
normality. 

2.  LOG  SPLINE  DENSITIES 

In  order  to  estimate  densities  g(0)  of  unspecified  form,  we  consider  densities  of  the 


form  es^  where  s(0)  is  a  "spiinc  function"  as  defined  below.  For  further  information  on 
splines,  the  reader  is  referred  to  de  Boor  (1978).  Splines  are  known  to  have  excellent 
approximation  properties  for  large  classes  of  functions,  hence  log(gQ(0))  should  be  well 
approximated  by  a  spline  s(0).  In  addition,  the  log  spline  densities  are  automatically 
nonnegative,  simplifying  the  estimation  algorithm. 

For  computational  purposes,  we  truncate  the  support  of  g(  0)  to  a  bounded  interval  0 
£  [a,b],  where  — »  <  a  <  b  <  ®.  A  spline  function  s(0)  on  [a,b]  of  order  k  is  a  piecewise 
polynomial  of  order  k  (or  degree  k  —  1)  with  possible  discontinuities  in  the  function  or  its 
derivatives  at  a  finite  number  of  points  t  =  {t j ,  -  •  -  ,tu}  C  [a,b],  where  a  =  tj  <  t2  <•  •  •<  t 
=  b.  These  points  are  called  the  "knots"  of  s(0).  A  repeated  knot,  say  tj  =  tj  ,  1  =  •  •  •  = 

ti  H-  M _ 1’  's  sa^  t0  ^ave  M-  Each  knot  (including  multiplicity)  removes 

one  continuity  constraint  on  s{0)  and  its  derivatives,  so  that  s(0)  is  assumed  continuous 
with  (k  -  M  -  1)  continuous  derivatives  at  a  knot  t  when  t  has  multiplicity  M.  With  this 
notation,  a  function  s(0)  is  called  a  spline  of  order  k  if  it  satisfies  the  following  properties: 

(i)  s(0)  is  a  polynomial  of  degree  (k  -  1)  on  (t},tj  +  for  i  =  !,•••,  u  -  1. 

(ii)  If  tj  has  multiplicity  1VF,  s^(0)  is  continuous  at  tj  for  j  =  0,-  •  • ,  k  -  Mj  -  1. 
The  space  of  all  such  functions  will  be  denoted  by  Sk 

Because  t  is  linear  and  finite  dimensional,  it  has  a  basis.  Following  de  Boor 
(1978),  we  use  the  B— spline  basis,  a  basis  consisting  of  splines  {Bj(0;t):  i  =  1,-  •  -  ,u}  e  t 
which  have  a  minimal  support  property, 

(2-1)  Bs(0;t)  =  O,  **[t.,ti  +  k]. 

This  property  characterizes  the  basis  and  makes  it  especially  convenient  to  work  with. 

With  this  notation,  the  Curry  Schoenberg  theorem  states  that 

(2.2)  Sk  t  =  js(0):  s(0)  =  S^jBj(0;t),  realj. 

Now  let  t  denote  the  class  of  log  spline  densities  whose  exponents  are  in  Then  we 
have  explicitly 


Gk  t  =  |eS^iBi(^;tV/ eS^iBi(x;t)dx:  p.  realj 
From  this  representation  we  see  that  for  a  fixed  knot  set  t,  G^  t  is  actually  an  exponential 
family  with  natural  parameter  p.  Note  that  t  is  overparameterized  since  Sj,  t  contains 
all  constants,  hence  there  are  actually  N  —  1  free  parameters  here. 

The  underlying  assumption  of  the  analysis  in  many  item  response  applications  is 
that  g (0)  is  standard  normal.  Thus  it  is  reasonable  to  take  a  symmetric  interval  such  as 
[a,b]  =  [—3. 5, 3. 5]  for  the  support  set.  As  in  de  Boor  (1978,  Chapter  XII),  we  assume  knots 
of  multiplicity  k  at  the  boundaries  in  order  to  free  s (9)  from  boundary  conditions  but 
simple  knots  in  the  interior.  Throughout,  we  will  let 


tl  ~  h 


—  tk  -  a,  tN  —  tN  +  i 


=  t 


N  +  k 


=  b, 


T, 


(2.3a) 
and 

(2.3b)  tk  +  u  =  $_i(u/(N  — k  +  l)),u  =  1,---,N  — k  +  1. 

Here  $  is  the  cdf  for  the  standard  normal  distribution.  By  (2.1),  this  defines  N  B-splines 
of  order  k,  and  the  multiple  knots  at  a  and  b  imply  that  s(0)  has  no  constraints  at  either 
boundary.  There  is  no  theoretical  reason  for  the  interior  knot  placement  given  here,  but 
the  strategy  seems  reasonable  for  placing  ample  knots  in  the  region  where  gg(#)  might  be 
expected  to  have  interesting  features.  (In  general,  the  issue  of  optimal  knot  placement  is  a 
delicate  problem  which  seldom  has  satisfactory  closed  form  solution.)  For  this  knot  set  t, 


the  absence  of  boundary  conditions  implies  that  t  also  has  the  alternate  representation 

k  .  N— k 

(2.3)  s(0)  =  £  ~  1  +  S  Tu(0-tk  +  |1)“ 

j=l  J  u  =  l  u  k  +  u  + 


k  - 1 


k _ 1  k _ 1 

for  constants  o-j  and  7U,  where  (^  —  t)+  =  (0—t)  for  9  >  t,  and  0  otherwise.  Thus 


with  N  =  k,  it  is  clear  that  s(0)  is  a  polynomial  of  degree  k  —  1.  In  particular,  when  k  =  3. 


Sk  t  contains  all  quadratic  polynomials  and  G^  t  contains  all  normal  distributions. 

The  basis  for  inference  using  the  family  Gk  t  is  the  following. 

THEOREM.  If  gg(0)  -*  0  as  9  ~  ±oo  and  log(g)  is  continuously  differentiable,  then  for  any  e 
>  0.  there  exists  a  knot  set  t  and  a  density  g(0\tp)  e  G,  .  such  that 


/ 


sup  \gJ0)  -  g(0\<p)\  <  e. 

—00  <  9  <  00 

Proof.  One  can  choose  the  interval  [a,b]  such  that  |gg(0)|  <  e/2  for  9  g  [a,b].  Under  these 
conditions,  t  can  be  chosen  to  make  |s(0)  —  log(gg(0))|  uniformly  small  on  [a,b],  hence  the 
result. 


3.  MARGINAL  MAXIMUM  LIKELIHOOD 

From  (1.2),  the  marginal  mle  over  G^  ^  is  obtained  by  maximizing 

log(L(v?))  =  -Sqj,^  +  S onrlog| J* pf(  ^)g(  6>|  y?)d ^  . 

If  we  renormalize  by  dividing  by  n,  let  =  n  /n,  substitute  (2.2)  for  g{0\<p),  and  ignore 
terms  not  depending  on  <p,  the  marginal  mle  problem  is  to  maximize 

^gWrl°g{j*  Pr(^)e^i^i^’t^ci^} 

subject  to 

f  e£^iBi(^;t)d  9=  1. 

This  is  simplified  by  the  following  variant  of  a  result  of  Silverman  (1982). 

LEMMA.  If  w  >0  and  £  w  =  1,  the  maximum  value  of 
r  _  r 

(3.1)  K(s)  =  £  w  log j  fp  J9)es('^d0 

r=0  r  r 

over  s(0)  e  subject  to 

(3.2)  JV^W=1 

is  the  same  as  the  value  of  K(s)  at  the  unconditional  maximizer  over  t  of 

(3.3)  £  wrlog{/ pr(^)es(^d^}  -  / es(^W. 

* 

Proof.  For  s  e  t,  let  s  (9)  =  s{9)  -  logjjV^dxj,  so  that  J*es  ^d0=l.  Then 
£  wrlog|j* pr(#)es  ^dtfj  ~/es 


s 


,-o 


I 

& 

* 

•v-- 


11^ 

y. 


V, 

y 

■> 


£ 


m 
_  v 


r=0 

m 


^logjj* Pr(#)es^Wj  -  E^logjj es^  ^d#j  -  1 
=  ^^wrlog|j* pr(0)es^W  -  logjjV^WJ  -  1. 

But  1  +  log(t)  <  t  with  strict  inequality  for  t  4-  1,  so  the  last  term  is  bounded  below  by 

E^logjj* Pr(0)es^d#j  -  f  es^d(9, 

with  strict  inequality  unless  J* es^d0  =  1.  Thus  the  maximizer  of  (3.3),  say  s ,  ( ) ,  must 
satisfy  (3.2),  so  within  this  class  max  K(s)  >  K(s^).  But  for  any  s  satisfying  (3.2),  K(s^) 
1  >  K(s)  —  1  by  the  optimality  of  s^  since  (3.2)  holds  for  Sy  This  proves  the  lemma. 
REMARK.  The  lemma  shows  that  "one"  is  the  appropriate  Lagrange  multiplier  for  the 
problem. 

We  have  implemented  the  marginal  maximum  likelihood  using  Newton-Raphson 

iteration.  To  describe  the  algorithm,  let 

m 


l(<p)  =  E  w^logj J" p^e^i^V^Wj  -  f  e^i^i^W 

denote  the  function  to  be  maximized.  Recalling  that  g(0|v?)  =  e^i^i^’^,  we  define 
f  Pr(  ^)g(^l(/J)d^  =  Egpr 
and  for  notational  purposes  write 

jf- / Pr(<WMd»  =  /  Bu(ftt)Py)g(«|v.)d<l=  EgBuPr. 


With  similar  notation,  define 

i 

o 

U'  rv 

d 
’u 
and 


g^/pr(«)g(%)d«=EgBuBvpr, 

4-/«(«l*>)d»=EBu, 


vJ^fg(e\v)de=EgBnBvpi. 

Then  the  gradient  Vf(tp)  is  the  N— vector  with  components 

di^  =  E  w  (E  B  p  )/E  p  -  E  B  , 
r  r  g  u*V'  g^r  g  u’ 


9 


and  the  Hessian  matrix  FI(^)  is  the  N  *  N  matrix  whose  (u,v)th  element  is 

WM,  =  ?  Wr|(EgBuBv»r>EgPr  ~  <EgBuPrKEgBvPr)}/(EgPr)2  - 
Starting  with  initial  guess  the  Newton— Raphson  method  for  convergence  to  p  is  to 
iteratively  compute 


<p(l  +  ^ 

until  convergence  is  obtained.  The  algorithm  was  implemented  with  a  step  halving 
modification:  if  at  some  stage  ((ip^  +  <  ({<p^),  <//'  +  ^  was  replaced  by  <p(l  +  l>  = 

-  cH (<p^)  for  c  =  1/2,  1/4,-  •  •  until  improvement  resulted  or  until  c  = 

1/32.  The  starting  value  was  taken  to  satisfy  log(g(0|y/^))  =  s{$\p)  =  — 0^/2  — 
log(2x)/2  thus  forcing  g(0|v^)  to  be  standard  normal. 


Convergence  of  the  Newton-Raphson  algorithm  is  assured  if  H (p)  is  a  negative 
definite  matrix  for  all  p.  Unfortunately,  H(p)  need  not  be  negative  definite  as  the 
following  argument  shows.  Let  B(0)  =  ErB^)  for  arbitrary  constants  r.,  and  let 

hr(^  =  Pr(^)g(^l  ¥>)/  /pr(u)g(u|p)du 

and  nr  =  J  B(  0)hr(  0)dd.  Then  rH(^)r  =  Ewf  f[B{0)  -  ^  ]2hr(0)d0- f  B(0)2g(%)d0. 

There  is  no  theoretical  reason  for  rtH(y?)r  to  be  either  positive  or  negative.  However,  in 
the  test  cases,  H(^^)  appeared  to  be  negative  definite,  and  convergence  was  very  fast  (at 
most  10  steps)  when  p  had  less  than  9  components.  For  severely  overparameterized  cases, 
the  step— halving  modification  to  the  Newton-Raphson  algorithm  greatly  improved 
convergence.  Using  the  results  of  the  next  section,  this  argument  can  be  modified  to  show 
that  H(  ip )  becomes  be  negative  definite  in  a  neighborhood  of  the  maximizing  p  as  n  -  oo  if 
gQ  is  well  approximated  from 

In  the  test  cases  reported  below,  the  support  of  gg  was  assumed  to  be  [a.b]  = 

[-3. 5.3. 5]  following  the  assumption  that  gQ  should  be  roughly  standard  normal.  Simpson's 
rule  with  51  points  was  used  to  approximate  the  integrals,  and  all  computations  were 
performed  in  double  precision  arithmetic  on  an  IBM  4381  computer.  The  spline 


Uo 


10 
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evaluations  were  performed  using  public  domain  FORTRAN  routines  from  de  Boor  ( 1978). 
SPLINT  for  initially  computing  ip^  and  B VALUE  for  all  B— spline  evaluations. 

As  an  experiment,  a  program  was  written  to  implement  the  EM  algorithm  to  obtain 
<p.  Although  essentially  the  same  estimates  were  obtained,  convergence  was  much  slower. 
Our  conclusions  were  that  the  EM  algorithm  could  be  used  if  one  wanted  to  estimate  item 
parameters  along  with  gg  as  in  Bock  and  Aitken  (1981)  or  Rigdon  and  Tsutakawa  (1983), 
but  if  the  primary  purpose  is  to  estimate  gg  with  fixed  item  parameters,  the 
Newton— Raphson  method  is  preferable. 


4.  IDENTIFI ABILITY  AND  CONSISTENCY 


By  the  Strong  Law  of  Large  Numbers, 


where 


=  nr/n  -  jrr, 


111  III 

\  =  yij  =  rl  =  /Pl^  yij  =  rl%o(0)d0 


•  i  ij 

,=i 

-  v 


e  yi/j}/pr(0)gg(W  r  =  0,-  •  -,m 


with  y.  =  Sy...  Thus  in  the  limit,  marginal  mle  is  equivalent  to  maximizing 

j 

M(g)  =  E  Trlogjjpr(0)g(%)d0j 
m 

over  G.  ..  Let  k  =  f  P[  E  y;i  =  x\0\%{Q\<p)<i9  so  that  M(g(0|<p))  =  S  w  logT  +  constant. 
k,  t  r  j  :  i  i  rr 

j=i 

The  information  inequality  (cf.  Rao  (1973).  chapter  5)  states  that  £  /Mogj,  <  £  TMogT. 

* 

with  equality  if  and  only  if  7rr  =  for  r  =  0, •  •  -  .m.  Thus  if  gg(0)  =  g(9\<p  ),  one  might 

expect  <p  to  provide  a  consistent  estimate  of  An  analysis  similar  to  that  of  Section  3 

* 

can  be  used  to  show  that  the  Hessian  matrix  for  M(g(#|<p))  is  nonpositive  definite  at  . 
Unfortunately,  in  the  nonparametric  setting  where  gg  is  completely  unspecified,  gg 


mm. 


may  not  be  identifiable  given  M(g).  To  see  this,  note  that  if  h(  9)  is  a  function  such  that 

J* Pr(  0)h{  0)d9  =  0  for  all  r.  J h( 9)d9  =  0,  and  g  +  h  >  0,  then  (g  +  h)  is  a  density  and  M(g 

+  h)  =  M(g).  It  seems  plausible  that  these  conditions  actually  force  h  to  be  small, 

perhaps  in  some  norm,  if  m  is  moderately  large,  but  we  are  unable  to  prove  this.  Further 

analytical  work  needs  to  be  done  in  this  area. 

Identifiabilitv  did  not  appear  to  be  a  problem  in  the  test  cases  we  studied,  and  we 

conjecture  that  it  is  not  a  problem  in  general  for  tests  of  moderate  length  m.  Intuitively. 

* 

if  gg  is  smooth,  the  Theorem  of  Section  2  guarantees  that  there  is  a  member  g{0\ ip  )  of 
some  equivalence  class  |g:  H(g)  =  H(g)  j  very  close  to  the  true  gg  which  can  be  consistently 
estimated.  We  believe  that  it  should  be  possible  to  prove  identifiabilitv  either  by  letting  m 
-  x  or  by  assuming  a  prior  distribution  on  the  item  parameters. 

It  is  possible  to  carry  out  at  least  a  formal  asymptotic  theory  to  get  approximate 
pointwise  confidence  intervals  (ignoring  bias)  for  gg(p).  Stone  and  Koo  (1986)  did  this  for 
the  ordinary  density  estimation  problem.  The  details  in  the  present  context  will  be 
provided  elsewhere. 

5.  A  LIKELIHOOD  RATIO  TEST  FOR  NORMALITY 

Here  we  focus  on  quadratic  splines  with  k  =  3,  hence  from  (2.4) 

u  9  N— 3  9 

(5.1)  E  aES.(0;t)  =  a,  +  a90  +  oiJr  +  E7u(0-t3+  )“. 

i=l  “  '  u  =  l 

Note  that  is  determined  by  (3.2),  so  that  there  are  actually  N  -  1  free  parameters. 
Moreover,  if  N  =  3,  log(g(0|<p))  is  quadratic  hence  g(0|<p)  is  normal.  If  t  is  fixed  with  N  > 
3  large  enough  so  that  gg  is  well  approximated  from  t,  it  is  reasonable  to  test  that  gg  is 
normal  by  testing  Hgi  =  •  •  •  =  7^  _  ^  =  0.  The  likelihood  ratio  test  statistic  = 
2[!ogL(y\.)  -  logL(^)]  thus  has  an  approximate  chi-square  distribution  with  N  -  3 
degrees  of  freedom  under  Hg. 


A  fundamental  question  here  is  the  proper  choice  of  N.  For  N  too  large,  the 
marginal  mle  will  tend  to  estimate  the  discrete  solution  and  appear  too  rough,  but  if  N  is 
too  small,  g(0|</>)  will  not  be  able  to  differentiate  features  in  the  true  gg.  The  situation  is 
anlaogous  to  ordinary  density  estimation  where  the  proper  "bandwidth"  parameter  must  be 
chosen  (cf.  Silverman,  1986).  From  limited  experience  with  simulated  data,  it  appears  that 
N  between  6  and  8  appears  to  work  well. 

We  have  experimented  with  an  ad  hoc  data-based  approach  to  the  choice  of  N 
based  on  likelihood  ratio  test  statistics.  Define 

=  2(logL(v>N)  -  logL(^  _  j)], 

the  change  in  the  likelihood  ratio  statistic  due  to  the  addition  of  the  Nth  parameter.  As  a 
criterion  for  choosing  N,  we  propose  adding  parameters  until  the  increase  in  A^  is 
negligible.  Since  the  families  t  are  not  nested  as  N  is  increased  when  t  is  defined  by 
(2.3),  the  likelihood  ratio  test  does  not  apply  to  A^.  However,  it  does  appear  that  a 
strategy  of  adding  parameters  until  A^  fails  to  change  appreciably  is  effective  in  choosing 
a  suitable  N.  The  theoretical  implications  of  this  proposal  are  topics  for  further 
investigation. 

While  successive  models  are  not  nested,  there  are  models  which  are.  Suppose  two 
models  have  N  >  N  with  respective  knot  sets  t  and  t.  If  t  C  t.  then  from  (5.1)  the  models 
are  nested,  so  the  likelihood  ratio  test  does  apply.  This  is  always  the  case  when  using  the 
knots  given  by  (2.3)  if  (N  -  1)  divides  (N  —  1). 

6.  SIMULATION  RESULTS 

In  order  to  assess  the  feasibility  of  the  proposed  method,  several  experiments  were 
conducted  with  simulated  data.  The  results  of  two  trials  are  reported  here.  In  both  cases 
the  0.' s  were  chosen  equally  spaced  on  [-1.5, 1.5]  with  0.  =  3(j  -  1 )/( n  -  1)  -  1.5. 

CASE  1.  In  the  first  experiment,  there  were  n  =  2000  subjects,  m  =  20  items,  and  gg  was 
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taken  to  be  standard  normal.  The  results  presented  in  Table  1  clearly  show  that  there  is 
no  evidence  that  gg  is  not  normal.  Figure  1  shows  graphs  of  g(0|^)  for  selected  N.  As  N 
increases,  the  marginal  mle  appears  to  be  converging  to  a  discrete  distribution  with  perhaps 
4  point  masses  at  roughly  —1.8.  —0.7,  0.2,  and  1.2.  This  phenomenon  is  to  be  expected  and 
demonstrates  the  potential  problem  for  overfitting.  On  the  other  hand,  the  A^.  and 
statistics  presented  in  the  table  show  that  there  is  no  reason  based  on  the  data  to  choose  N 
larger  than  3. 

CASE  2.  For  the  second  experiment,  we  again  chose  n  =  2000  and  m  =  20,  but  took  gg  to 
be  a  mixture  of  normal  densities, 

(6.1)  gg (0)  =  .6n(s(0-  a))  +  .4n(s(0  +  1.5a)) 

where  n(0)  =  e  ^^“/(2x)^^,  a  =  0.65,  and  s  =  (1  —  1.5a^)^.  The  constants  a  and  s 
were  chosen  so  that  j"  0gg(0)d0  =  0  and  J  0* gg(0)d#  =  1.  The  results  given  in  Table  2 
show  large  increases  in  when  N  changes  from  3  to  4  and  4  to  5,  but  negligible  changes 
thereafter.  Thus  the  statistics  indicate  that  the  choice  N  =  5  is  best.  Figure  2  giving  the 
plots  of  selected  fits  in  comparison  with  the  true  density  shows  that  N  =  5  is  indeed  a  good 
choice.  Note  that  the  log  likelihood  actually  decreased  in  going  from  N  =  5  to  N  =  6.  This 
is  possible  since  the  two  models  are  not  nested. 

In  Figure  3,  we  attempted  to  see  how  well  the  mixture  density  of  (6.1)  could  be 
approximated  from  fc.  There  are  potentially  many  ways  to  choose  the  parameter  </>  to 
obtain  a  good  fit.  We  decided  to  approximate  the  Kullback-Leibler  distance 

/log(go(0)/g(%))go(0)d0 

by  minimizing 

/{log(go(0»  -s(%)}2go(0)d0 

subject  to  j' es^d$  =  1.  Selected  fits  are  shown  in  Figure  3.  It  is  obvious  that  the  bias  in 
this  example  from  approximating  gg  from  t  is  negligible  for  N  >  5. 


'•>y 


7.  CONCLUSIONS 


I 


In  this  report  the  log  spline  method  of  Stone  and  Koo  is  used  to  estimate  the  latent 
trait  density  in  item  response  models  in  which  the  item  parameters  are  assumed  known. 
Despite  the  identifiability  problem  inherent  in  the  model,  we  show  that  consistent 
estimates  of  good  approximations  to  the  unknown  density  are  possible.  In  addition,  a 
likelihood  ratio  test  for  normality  is  introduced.  Several  examples  are  given  which  show 
that  the  method  can  work  well  in  practice,  both  for  testing  that  a  density  is  normal  and  for 
estimating  the  density  in  a  test  case  where  it  is  not  normal.  In  addition,  a  heuristic 
method  is  proposed  for  choosing  the  appropriate  number  of  parameters  to  put  into  the 
model. 

The  results  here  extend  density  estimation  methodology  to  a  case  where  the  random 
variables,  namely  latent  traits,  are  unobservable.  The  method  as  proposed  is  primarily  a 
data-analytic  tool  rather  than  an  inferential  one,  and  the  intended  purpose  is  to  provide 
researchers  with  information  similar  to  that  obtained  from  histograms  for  usual  observable 
data.  The  problem  of  choosing  suitable  N  is  analogous  to  that  of  choosing  the  correct 
number  of  bins  for  a  histogram.  A  nonnormal  ability  density  could  indicate  a  mixture  of 
populations  as  in  the  second  example,  or  it  could  be  a  sign  of  multidimensionality.  In 
either  case,  a  visual  realization  of  the  density  could  be  a  useful  exploratory  tool. 
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Table  1 


Case  1:  Standard  normal  density 


0.091 

0.300 

1.686 

1.967 

4.286 

4.239 

6.620 


Table  2 


Case  2:  Mixture  density  (6.1) 


10.213 

27.100 

26.930 

27.370 

27.433 

28.140 

28.217 


0.091 

0.209 

1.386 

0.281 

2.282 

0.375 

2.000 


10.213 

16.887 

-0.169 

0.439 

0.064 

0.706 

0.077 
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